ABSTRACT

We present a method to make predictions with sets of correlated data values, in this
case QSO flux spectra. We predict the continuum in the Lyα forest of a QSO, from
1020 - 1216 Å, using the spectrum of that QSO from 1216 - 1600 Å. We find correlations
between the unabsorbed flux in these two wavelength regions in the HST spectra of 50
QSOs. We use principal component analysis (PCA) to summarize the variety of these
spectra, we relate the weights of the principal components for 1020 - 1600 Å to the
weights for 1216 - 1600 Å, and we apply this relation to make predictions. We test the
method on the HST spectra, and we find an average absolute flux error of 9%, with a
range 3 - 30%, where individual predictions are systematically too low or too high. We
mention several ways in which the predictions might be improved.

Subject headings: methods: data analysis - methods: statistical - techniques: spectroscopic -
quasars: absorption lines - quasars: emission lines - intergalactic medium

1. INTRODUCTION

We would like to have an accurate and objective way to find the continuum level in QSO
spectra, because these levels are required to measure the amount of absorption. The uncertainty in
the continuum level is often one of the largest uncertainties in studies of the intergalactic medium
(IGM, e.g. Croft et al. 2002), and the precise continuum shape is required for precision measurement
of absorption lines, such as the measurement of D/H (Kirkman et al. 2003; Suzuki et al. 2003).

Standard methods of estimating the continua in the Lyα forest region of QSO spectra are
frequently unsatisfactory. For redshifts 2 < z < 4, the standard way to find a continuum level is
to fit a smooth curve over the peaks of the flux in the Lyα forest. This method works well, giving
< 2% errors in the continuum level in high resolution spectra (e.g. 8 km s⁻¹ FWHM) with high
S/N (e.g. 100 per 2 km s⁻¹, Kirkman et al. 2003). However, the method fails when there are few
pixels that we can clearly identify as unabsorbed continuum, and this lack of continuum information
is common in low S/N spectra, in low resolution spectra where lines blend together, and at higher
redshifts where the Lyα forest absorbs more than a few percent at all wavelengths. In fact, by
redshifts z > 6 the complete Gunn-Peterson trough (Becker et al. 2001; Djorgovski et al. 2001;
Fan et al. 2001) makes it impossible to directly measure the continuum. Instead, at high redshifts,
especially z > 4, it is common to approximate the continuum in the Lyα forest with a power-law
extrapolation from wavelengths larger than Lyα (Telfer et al. 2002; White et al. 2003). But the
continuum in the Lyα forest is not a power law, because the wings of the Lyβ-O VI emission line,
and especially the Lyα line, extend far into the Lyα forest, and there exist weak emission lines,
especially near 1073 and 1123 Å (Zheng et al. 1997; Vanden Berk et al. 2001; Bernardi et al. 2003).

We might be able to predict the unabsorbed flux in the Lyα forest if it is correlated with the
unabsorbed flux at other wavelengths. Here, the unabsorbed flux includes both continuum and
emission lines, but not the random intervening absorption. We measure correlations in a set of
QSO spectra (hereafter the training set) that cover both 1020 Å < λ < 1215 Å (hereafter the
blue side) and 1216 Å < λ < 1600 Å (hereafter the red side). We will use the red side spectra of
individual QSOs to make predictions of their blue sides.

We use Principal Component Analysis (PCA) to summarize the information in the QSO spectra.
PCA seeks to reduce the dimensionality of large data sets and is widely used in astronomy
(Whitney 1983; Kanbur et al. 2002; Efstathiou 2002). Francis et al. (1992) applied PCA to the
LBQS (Hewett et al. 1995, 2001) optical spectra of QSOs to give an objective classification scheme,
and they showed that any normalized QSO spectrum, $q_i(\lambda)$, is well represented by a
reconstructed spectrum, $r_{i,m}(\lambda)$, that is a weighted sum of $m$ principal components,

$$ r_{i,m}(\lambda) = \mu(\lambda) + \sum_{j=1}^{m} c_{ij}\,\xi_j(\lambda), \tag{1} $$

where $i$ refers to a QSO, $\mu(\lambda)$ is the mean of many QSO spectra, $\xi_j(\lambda)$ is the
$j$th principal component, and $c_{ij}$ is its weight. Instead of classifying QSO spectra, we use
PCA to make predictions.

In §2 we describe HST spectra that we use for the training set. We show the correlations in 
the QSO spectra and results of the PCA in §3. In §4 we show how we make predictions and we 
discuss their accuracy. 



2. QSO SPECTRA AND THEIR CORRECTION

For the training set we use UV spectra of low redshift (z < 1) QSOs because they have little
absorption and we can clearly see their continuum levels. Here we describe these spectra, the criteria
that we use to select them, and the corrections that we make.

We use a subset of the 334 high resolution Hubble Space Telescope (HST) Faint Object Spectrograph
(FOS) spectra collected and calibrated by Bechtold et al. (2002). This sample includes
all of the high resolution QSO spectra from the HST QSO Absorption Line Key Project (Bahcall
et al. 1993, 1996; Jannuzi et al. 1998). The gratings chosen are G130H, G190H, and G270H, and
their spectral resolution is R ~ 1300. Bechtold et al. (2002) identified both IGM and ISM lines in a
uniform manner and they applied Galactic extinction corrections using the Galactic reddening map
of Burstein & Heiles (1982) and the Milky Way reddening curve of Cardelli, Clayton, & Mathis
(1989).

We select QSO spectra by wavelength coverage and S/N, and we remove a few QSOs with
peculiar spectra. We reject QSOs that do not have complete coverage from 1020 Å to 1600 Å. This
range covers from the Lyβ + O VI emission line blend to the C IV emission line. A larger range
would help reveal the shapes of the QSO continua, but we would then have fewer QSOs in the
sample.

We reject QSOs that did not have an average S/N > 10 per binned pixel (0.5 Å in the rest
frame) from 1050 Å to 1170 Å. We are interested in the intrinsic variation of the QSO spectra
against the mean spectrum. Photon noise adds variation, masks the intrinsic variations, and alters
the primary principal components. Before we removed the low S/N spectra, we found that some
principal components were largely reproducing the photon noise of the spectra with unusually low
S/N.

We remove QSOs with Broad Absorption Lines and Damped Lyα systems because we are unsure
where to place their continua. We also remove Q0219+4248 and Q0906+4305, whose emission line
features are extremely weak. These removals make our sample not representative of all QSO spectra.

We end up using the spectra of 50 QSOs that we list in Table 1. The mean redshift is 0.58, 
with a standard deviation of 0.27 and a range from 0.14 to 1.04. The average S/N is 19.5. 

We will represent the spectra of all 50 QSOs by fitted smooth curves that reduce the effects of
photon noise and interpolate over the absorption lines. To find the smooth curves, we mask the
absorption lines which Bechtold et al. (2002) identified in both the blue and red sides. Then, for
every 50 Å interval, with 20 Å overlaps, we fit Chebyshev polynomials and we choose the order
of the polynomials so that the reduced χ² becomes close to unity. The order is about 4 - 6 if no
strong emission line lies in that interval. In intervals which include strong emission lines such as
Lyα and C IV, the order becomes 30 - 40. For a few QSOs we made further adjustments by hand.
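
The windowed fit described above can be sketched roughly as follows, in Python with NumPy (an assumption; the original fitting code is not shown here). The function name `fit_window`, the tolerance `tol` that stands in for "close to unity", and the synthetic window are all illustrative. The sketch handles a single window and leaves out the blending of overlapping windows and the adjustments made by hand.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def fit_window(wave, flux, sigma, good, max_order=40, tol=1.1):
    """Fit one window; return (order, model) for the lowest order with reduced chi^2 <= tol.

    good is a boolean array: True for pixels not masked as absorption.
    tol is a hypothetical threshold standing in for "close to unity".
    """
    # Map wavelengths onto [-1, 1], the natural domain of Chebyshev polynomials.
    x = 2.0 * (wave - wave[0]) / (wave[-1] - wave[0]) - 1.0
    for order in range(1, max_order + 1):
        # Weights multiply the unsquared residuals, so use 1/sigma.
        coef = cheb.chebfit(x[good], flux[good], order, w=1.0 / sigma[good])
        model = cheb.chebval(x, coef)
        dof = good.sum() - (order + 1)
        chi2_red = np.sum(((flux[good] - model[good]) / sigma[good]) ** 2) / dof
        if chi2_red <= tol:
            break
    return order, model

# Synthetic 50 Å window sampled every 0.5 Å, with one masked absorption line.
wave = np.arange(1300.0, 1350.5, 0.5)
x = 2.0 * (wave - wave[0]) / (wave[-1] - wave[0]) - 1.0
continuum = 1.0 + 0.1 * x**2
flux = continuum.copy()
flux[40:46] *= 0.5                      # absorption line
good = np.ones(wave.size, dtype=bool)
good[40:46] = False                     # mask it, as for the identified lines
sigma = np.full(wave.size, 0.02)
order, model = fit_window(wave, flux, sigma, good)
```

Because the masked pixels are excluded from the fit but included in the returned model, the fitted curve interpolates over the absorption line, which is the behaviour the text asks of the smooth curves.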

In Table 1 we give emission redshifts that we measure from the peaks of the Lyα emission
lines. In the rest frame the emission line peaks align, and the asymmetric profiles become a part
of the variance in the set of spectra. If we do not use the peak of the Lyα line, but instead we
cross-correlate with a template of known redshift, we find that we need extra principal components to
reconstruct the emission lines. Once we obtain the redshifts, we shift the spectra to the rest frame,
and we rebin them into 0.5 Å pixels. We then have 1161 pixels of flux data values per QSO in the
range 1020 Å - 1600 Å.
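
As a rough illustration of the rest-frame shift and the 0.5 Å grid, the following Python sketch resamples an observed spectrum onto 1161 pixels. The function name `to_rest_frame` and the synthetic spectrum are hypothetical, and linear interpolation is used only for brevity; it is an assumption, not the rebinning scheme of the paper.

```python
import numpy as np

def to_rest_frame(obs_wave, flux, z_em, lo=1020.0, hi=1600.0, step=0.5):
    """Shift to the rest frame and resample onto the fixed 0.5 Å grid."""
    rest_wave = obs_wave / (1.0 + z_em)
    n_pix = int(round((hi - lo) / step)) + 1          # 1161 for the defaults
    grid = lo + step * np.arange(n_pix)
    # Linear interpolation is an assumption made for brevity; it does not
    # conserve flux the way a true rebinning would.
    return grid, np.interp(grid, rest_wave, flux)

# Synthetic observed spectrum of a z = 0.58 QSO with a linear flux law.
z_em = 0.58
obs_wave = np.linspace(1500.0, 2600.0, 3000)
obs_flux = 1.0 + 1.0e-3 * obs_wave
grid, rest_flux = to_rest_frame(obs_wave, obs_flux, z_em)
```

The pixel count follows from (1600 - 1020) / 0.5 + 1 = 1161, which matches the number quoted in the text.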

Since we are interested in the relative spectrum shape, we throw away absolute flux information.
We find the average flux in 21 pixels around 1280 Å, and we normalize all spectra to unit flux at
these wavelengths, far from any strong emission lines.

3. PRINCIPAL COMPONENT ANALYSIS OF QSO SPECTRA

We calculate the correlation of the fluxes at different wavelengths to see how different parts of
the typical QSO spectrum are related. In Figure 1, we see the 1161 × 1161 correlation matrix R
with elements

$$ R(\lambda_m, \lambda_n) = \frac{1}{N-1} \sum_{i=1}^{N} \frac{(q_i(\lambda_m) - \mu(\lambda_m))\,(q_i(\lambda_n) - \mu(\lambda_n))}{\sigma_m\,\sigma_n}, \tag{2} $$

where $q_i(\lambda)$ is the continuum fitted and normalized spectrum for the $i$th QSO, $N$ is the total
number of QSO spectra, and $\sigma_m$ and $\sigma_n$ are the standard deviations of the flux in the
$m$th and $n$th wavelength bins, $\lambda_m$ and $\lambda_n$ respectively.

We find moderate correlation, about 0.2 - 0.6, between the red and blue continua. The
correlation between the emission lines, 0.8, is much stronger, and hence we expect that the emission
lines in the red side will give good predictions for those in the blue side.

We can calculate the covariance matrix V for the 50 QSOs as:

$$ V(\lambda_m, \lambda_n) = \frac{1}{N-1} \sum_{i=1}^{N} (q_i(\lambda_m) - \mu(\lambda_m))\,(q_i(\lambda_n) - \mu(\lambda_n)). \tag{3} $$

In Figure 2 we see the covariance is relatively small in the continuum but large in the emission lines,
meaning that the emission lines vary a lot from QSO to QSO. The peaks near 1073 and 1123 Å
probably correspond to the weak emission lines in the Lyα forest that Bernardi et al. (2003) discuss.
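
A minimal Python sketch of the correlation and covariance matrices is given here, assuming the spectra are stored as rows of an array `Q` (a hypothetical name) and assuming the sample normalization 1/(N - 1). Random numbers stand in for the 50 fitted spectra.

```python
import numpy as np

def correlation_and_covariance(Q):
    """Q has shape (N, n_pix): one fitted, normalized spectrum per row."""
    N = Q.shape[0]
    mu = Q.mean(axis=0)                       # mean spectrum
    dev = Q - mu
    V = dev.T @ dev / (N - 1)                 # covariance matrix
    sigma = np.sqrt(np.diag(V))               # per-pixel standard deviation
    # Pixels with zero variance (e.g. the normalization point) give NaN here.
    with np.errstate(divide="ignore", invalid="ignore"):
        R = V / np.outer(sigma, sigma)        # correlation matrix
    return mu, V, R

rng = np.random.default_rng(0)
Q = 1.0 + 0.1 * rng.standard_normal((50, 30))   # stand-in for 50 spectra
mu, V, R = correlation_and_covariance(Q)
```

The correlation matrix is the covariance matrix with each element divided by the two standard deviations, so its diagonal is unity wherever the variance is non-zero.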

We can find the principal components by decomposing the covariance matrix V into the
product of the orthonormal matrix P, which is composed of eigenvectors, and the diagonal matrix
Λ containing the eigenvalues:

$$ V = P\,\Lambda\,P^{-1}. \tag{4} $$

We call the eigenvectors, the columns of the matrix P, the principal components. The principal 
components are ordered according to the amount of the variance in the training set that they can 
accommodate, such that the first principal component is the eigenvector which has the largest 
eigenvalue. 
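
The decomposition and ordering can be sketched in Python as below. This is an illustration under the assumption that a symmetric eigen-solver (`numpy.linalg.eigh`) is used; the function name `principal_components` and the random stand-in data are hypothetical. With the eigenvectors as columns of an orthonormal P, the covariance matrix equals P Λ Pᵀ.

```python
import numpy as np

def principal_components(V):
    """Return eigenvalues (descending) and eigenvectors (as columns) of V."""
    eigval, eigvec = np.linalg.eigh(V)        # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]          # largest variance first
    return eigval[order], eigvec[:, order]

rng = np.random.default_rng(1)
Q = rng.standard_normal((50, 20))             # stand-in data, not real spectra
V = np.cov(Q, rowvar=False)
lam, P = principal_components(V)
```

For a training set of N spectra the mean-subtracted data have rank at most N - 1, so at most N - 1 eigenvalues differ from zero however many pixels there are.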

Let us quantitatively assess how well we can reconstruct the QSO spectra using a certain
number of the components. We find the weight $c_{ij}$ of the $j$th principal component for QSO spectrum
$q_i(\lambda)$ from:

$$ c_{ij} = \int_{1020\,\mathrm{\AA}}^{1600\,\mathrm{\AA}} (q_i(\lambda) - \mu(\lambda))\,\xi_j(\lambda)\,d\lambda. \tag{5} $$

When we use the first $m$ components, we get the reconstructed spectrum $r_{i,m}(\lambda)$. The
$\xi_j(\lambda)$ look similar to QSO spectra, but with more structure at the wavelengths of the
emission lines. Examples of principal components and reconstructions, which are very similar to
ours, are given by Francis et al. (1992).

We now introduce the accumulated residual variance fraction $\delta E_{i,m}$:

$$ \delta E_{i,m} = \int_{1020\,\mathrm{\AA}}^{1600\,\mathrm{\AA}} (r_{i,m}(\lambda) - q_i(\lambda))^2\,d\lambda \Big/ \int_{1020\,\mathrm{\AA}}^{1600\,\mathrm{\AA}} (q_i(\lambda) - \mu(\lambda))^2\,d\lambda. \tag{6} $$

This quantity measures the square of the difference of a reconstructed spectrum from the continuum
fitted QSO spectrum, in units of the square of the difference between that QSO and the mean.
Hence $\delta E_{i,m}$ decreases from 1 to near zero as we add components to the reconstruction. The
$m$ in $\delta E_{i,m}$ tells us that we have used the first $m$ components in the reconstruction
$r_{i,m}(\lambda)$. In Table 2, we list the mean
$\langle \delta E_m \rangle = (1/N) \sum_{i=1}^{N} \delta E_{i,m}$, averaged over all 50 QSOs. The first three
components account for 77% of the variance, and the first 10 components absorb about 96%. Francis et al.
(1992, their Fig. 4) analyzed the LBQS set of QSO spectra (Hewett, Foltz, & Chaffee 1995, 2001).
Using a different statistic that accounts for photon noise, different wavelengths, and including BAL
QSOs and all absorption lines, they found that the first three components accounted for 75% of the
variance, and the first 10 components 95%. We also analyzed the LBQS spectra, kindly provided
by Paul Francis and Paul Hewett, to confirm that our implementation of the PCA matched theirs
given our wavelength range and selection criteria. The residual variance decreases in a different way
for each QSO because the contributions of the components differ. However, on average, the rate of
reduction slows after the third component and saturates around the 10th component. The components
beyond about the 10th look noisy and carry little information. In the following discussion, we use
up to the first 10 principal components.
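
The weights, the reconstruction, and the residual variance fraction can be illustrated with the following Python sketch. It works in discrete pixel space with unit-norm eigenvectors, so the integrals become sums and the pixel width is absorbed into the normalization; that choice, the function names, and the random stand-in data are assumptions made for illustration.

```python
import numpy as np

def weights(q, mu, xi):
    """Project (q - mu) onto each component; xi has components as columns."""
    return (q - mu) @ xi

def reconstruct(mu, xi, c, m):
    """Mean spectrum plus the first m weighted components."""
    return mu + xi[:, :m] @ c[:m]

def residual_variance_fraction(q, mu, xi, c, m):
    """Squared residual of the m-component reconstruction, relative to (q - mu)."""
    r = reconstruct(mu, xi, c, m)
    return np.sum((r - q) ** 2) / np.sum((q - mu) ** 2)

rng = np.random.default_rng(2)
Q = 1.0 + rng.standard_normal((20, 40))       # stand-in for 20 spectra
mu = Q.mean(axis=0)
eigval, eigvec = np.linalg.eigh(np.cov(Q, rowvar=False))
xi = eigvec[:, np.argsort(eigval)[::-1]]      # largest variance first
c = weights(Q[0], mu, xi)
dE = [residual_variance_fraction(Q[0], mu, xi, c, m) for m in range(20)]
```

With m = 0 the reconstruction is just the mean spectrum, so the fraction starts at 1, and because the components are orthonormal it can only decrease as components are added.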

4. PREDICTING SPECTRA 

Our goal is to predict the continuum of a QSO in the Lyα forest, the blue side, using a spectrum
of wavelengths larger than Lyα emission, the red side. In §4.1 we describe how we relate the blue
and red side continua, and in §4.2 we give a general recipe to make predictions.

4.1. METHODS 

Unfortunately, we do not have enough QSO spectra for both a training set and a separate set 
of spectra that we can use to test our predictions. Hence, we also use the training set for the tests. 
When we make a prediction for a QSO, we use a set of principal components which we generated 
without using any spectra of that QSO. We will call this the bootstrap method, and we will use 
it in most of the remainder of this paper. When we omit different QSOs, the first five principal 
components change by a few percent, most noticeably at the wavelengths of the emission lines. 
If we do not omit the QSO of interest, the complete set of principal components contains all the 
information on each QSO, including the continuum fitting and photon noise. We found the noise 
features could identify a QSO, giving weights with unrealistically high precision.

We chose three steps to quantify the relationship between the red and blue sides of the spectra
in the training set.

• Step 1: Find the first $m$ principal components $\xi_j(\lambda)$ and their weights, $c_{ij}$. We use all the
blue and red side wavelengths, 1020 Å to 1600 Å, and we use the bootstrap method to give
a slightly different set of $\xi_j(\lambda)$ for each QSO $i$.

• Step 2: Repeat step one, using only the red wavelengths 1216 Å to 1600 Å. We again keep
the first $m$ principal components, $\zeta_j(\lambda)$, and their weights, $d_{ij}$, which are similar to those from
step one.

• Step 3: Solve linear equations to find a projection matrix which relates $c_{ij}$ and $d_{ij}$.

We can write the weights in the $N \times m$ matrix form $C = c_{ij}$ and similarly for $D$. We would like
to find the $m \times m$ projection matrix $X = x_{ij}$ which translates weights found on the red side to the
weights for the whole spectrum:

$$ C = D \cdot X. \tag{7} $$

We have N = 50 QSOs, and we keep m = 10 components. We then have more equations 
(N) than unknowns (m) and we wish to find the least-squares solution to this over-determined set 
of linear equations. The solution matrix X can be found via the Singular Value Decomposition 
technique (Press et al. 1992). 
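
The three training steps can be sketched in Python as follows. This is a hedged illustration, not the original code: the function names, the synthetic low-rank spectra, and the sign-alignment step are assumptions. The alignment is added because eigenvector signs are arbitrary and could otherwise differ between runs that omit different QSOs. `numpy.linalg.lstsq` is used for the least-squares solution; it relies on an SVD-based LAPACK routine.

```python
import numpy as np

def pca(Q, m, ref=None):
    """Mean and first m principal components (as columns) of the rows of Q."""
    mu = Q.mean(axis=0)
    # The right singular vectors of the mean-subtracted data are the
    # eigenvectors of its covariance matrix, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(Q - mu, full_matrices=False)
    comp = vt[:m].T
    if ref is not None:
        # Eigenvector signs are arbitrary; align them with a reference set so
        # that weights from different leave-one-out runs are comparable.
        sign = np.sign(np.sum(comp * ref, axis=0))
        sign[sign == 0] = 1.0
        comp = comp * sign
    return mu, comp

def train_projection(Q, red, m):
    """Q: (N, n_pix) training spectra; red: boolean mask of red-side pixels."""
    N = Q.shape[0]
    _, xi_ref = pca(Q, m)
    _, zeta_ref = pca(Q[:, red], m)
    C = np.empty((N, m))                          # whole-spectrum weights
    D = np.empty((N, m))                          # red-side weights
    for i in range(N):
        others = np.delete(Q, i, axis=0)          # omit QSO i
        mu, xi = pca(others, m, xi_ref)           # Step 1
        mu_red, zeta = pca(others[:, red], m, zeta_ref)   # Step 2
        C[i] = (Q[i] - mu) @ xi
        D[i] = (Q[i, red] - mu_red) @ zeta
    X, *_ = np.linalg.lstsq(D, C, rcond=None)     # Step 3: solve C = D X
    return C, D, X

# Synthetic training set built from three hidden components plus small noise.
rng = np.random.default_rng(3)
n_pix, N, m = 60, 30, 3
basis = rng.standard_normal((m, n_pix))
Q = 1.0 + rng.standard_normal((N, m)) @ basis + 0.01 * rng.standard_normal((N, n_pix))
red = np.arange(n_pix) >= 20
C, D, X = train_projection(Q, red, m)
```

Each column of X is an independent least-squares problem with N equations and m unknowns, which is why N = 50 spectra are enough to determine a 10 × 10 matrix.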



4.2. MAKING A PREDICTION 

We are now ready to make a prediction of the Lyα forest continuum of any QSO spectrum,
which need not be in the training set, provided we have the red side of its spectrum. We obtain
predictions in three steps that are similar to those presented above to find X.

• Step A: Find the weights for the red spectrum,

$$ b_{ij} = \int_{1216\,\mathrm{\AA}}^{1600\,\mathrm{\AA}} (q_i(\lambda) - \mu(\lambda))\,\zeta_j(\lambda)\,d\lambda. \tag{8} $$

The $b_{ij}$ will be like the $d_{ij}$ in step 2 above, except that the principal components could be
different. If the QSO is not part of the training set, then the $\zeta_j(\lambda)$ can be derived using the
entire training set.

• Step B: Translate the weights from the red side, $b_{ij}$, to weights for the whole spectrum, using

$$ a_{ij} = \sum_{k=1}^{m} b_{ik}\,x_{kj}. \tag{9} $$

This resembles Equation (7), except that we now know X and we are deriving the $a_{ij}$ that
play the roles of the $c_{ij}$ of step 1.

• Step C: Make a predicted spectrum,

$$ p_{i,m}(\lambda) = \mu(\lambda) + \sum_{j=1}^{m} a_{ij}\,\xi_j(\lambda). \tag{10} $$

The predicted spectrum $p_{i,m}(\lambda)$ differs from the reconstruction $r_{i,m}(\lambda)$ because the reconstruction
uses weights derived from the blue and red sides of the spectrum, using Equation (1), whereas the
predictions use weights derived from the red part of the spectrum alone, using Equations (8) and
(9).

We provide the two sets of principal components, $\xi_j(\lambda)$ and $\zeta_j(\lambda)$, and the projection matrix
X, so that readers can make their own predictions.
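
Steps A to C amount to two matrix products and a sum, as the following Python sketch suggests. The function name `predict_spectrum`, the array layout (components stored as columns), and the toy inputs are assumptions for illustration; in practice the mean spectrum, the two sets of components, and X would come from the training set.

```python
import numpy as np

def predict_spectrum(q_red, red, mu, xi, zeta, X):
    """Predict the whole spectrum of one QSO from its red side.

    q_red: red-side flux; red: boolean mask selecting red pixels of the full grid;
    mu: mean spectrum (full grid); xi: (n_pix, m) whole-range components;
    zeta: (n_red, m) red-side components; X: (m, m) projection matrix.
    """
    b = (q_red - mu[red]) @ zeta      # Step A: red-side weights
    a = b @ X                         # Step B: translate to whole-spectrum weights
    return mu + xi @ a                # Step C: predicted spectrum

# Toy check with orthonormal components and an identity projection matrix.
rng = np.random.default_rng(4)
n_pix, m = 6, 2
red = np.arange(n_pix) >= 2
mu = np.linspace(1.0, 2.0, n_pix)
xi, _ = np.linalg.qr(rng.standard_normal((n_pix, m)))
zeta, _ = np.linalg.qr(rng.standard_normal((red.sum(), m)))
X = np.eye(m)
q_red = mu[red] + 2.0 * zeta[:, 0]    # red side built from the first component
p = predict_spectrum(q_red, red, mu, xi, zeta, X)
```

In the toy case the red side carries a weight of 2 on the first red component only, so with an identity X the prediction is the mean plus twice the first whole-range component.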



4.3. PREDICTION ACCURACY 

In Figure 3, we show two relatively successful and two unsuccessful predictions, $p_{i,10}(\lambda)$,
together with the corresponding continuum fitted QSO spectra, $q_i(\lambda)$. When the prediction fails, the
predicted spectrum is systematically either too low or too high, but with no preference.

We assess the errors on our predictions using the absolute fractional flux error:

$$ |\delta F_{i,m}| = \int_{\lambda_1}^{\lambda_2} \left| \frac{p_{i,m}(\lambda) - q_i(\lambda)}{q_i(\lambda)} \right| d\lambda \Big/ \int_{\lambda_1}^{\lambda_2} d\lambda. \tag{11} $$

For the blue side, $\lambda_1$ = 1050 Å and $\lambda_2$ = 1170 Å, avoiding the Lyβ and Lyα emission lines, and
for the red side, $\lambda_1$ = 1216 Å and $\lambda_2$ = 1600 Å. In Table 1, we list the absolute fractional flux
errors for each QSO, when we use m = 10 components. In Table 2, rows (c) and (e), we list
$\langle |\delta F_m| \rangle = (1/N) \sum_{i=1}^{N} |\delta F_{i,m}|$, the mean of the absolute fractional flux error from all 50 QSOs,
and we show how this mean changes with $m$. For comparison, in rows (b) and (d), we also list
$\langle |\delta F_m| \rangle$ obtained by reconstruction when we replace $p_{i,m}(\lambda)$ in Equation (11) by $r_{i,m}(\lambda)$ from
Equation (1). These $r_{i,m}(\lambda)$ use principal components derived from the blue plus red sides. We
find similar values if we use principal components from the red side alone.

The advantage of using more components is different for the reconstruction, the red side
prediction, and the blue side prediction. For the reconstruction and the red side prediction, more
components always improve the fit (Table 2, rows (b), (c), (d)), although the reduction of the error
is small after about the 10th component, and we think the remaining 3% absolute fractional flux
error is similar in size to the error of the continuum fitting (§2).

On the other hand, for the blue side prediction, adding more components does not reduce the
mean error. We think the reason is related to the properties of the first few principal components.
Only the third, fourth and fifth principal components have a significant slope. If we choose
appropriate weights for these components, we are likely to make accurate predictions, and adding more
components can reduce the residuals (Figure 3, panels (a), (b)). But if we choose inappropriate
weights, there is no way to correct the slope (Figure 3, panels (c), (d)). Hence the main error in our
predictions is a systematic slope error which makes the blue side prediction too high or too low.

Although our predictions give small errors for some QSOs, they give huge errors for others.
With 10 components, the mode of the $|\delta F_{i,10}|$ distribution is 3%, the median is 8%, and the range
is about 30%. We predict the blue continua of 28 QSOs out of 50 QSOs to $|\delta F_{i,10}|$ < 10%, but we do
not know which QSOs will have these small errors. We find that many of the QSOs that give the
largest errors have absorption or unusually low S/N on the red side of the Lyα line (1216 - 1240 Å,
noted in Table 1), which makes the continuum level hard to estimate. This region is significant
because it contains much of the variance and hence will have a large effect on the weights. However,
in other cases there is little or no absorption, and the slope of the spectrum appears to change as
we cross the Lyα emission line. We do not in general see any correlation between the errors in the
red and blue side predictions.

We introduce a third error statistic, the fractional flux error,

$$ \delta F_{i,m} = \int_{\lambda_1}^{\lambda_2} \frac{p_{i,m}(\lambda) - q_i(\lambda)}{q_i(\lambda)}\,d\lambda \Big/ \int_{\lambda_1}^{\lambda_2} d\lambda, \tag{12} $$

to help us estimate the error in the integrated flux. This error is related to the flux decrement statistic,
$D_A$ (Oke & Korycansky 1982; Bernardi et al. 2003), which has been used to calibrate numerical
simulations of the IGM (Croft et al. 2002). In Table 1 we see that $\delta F_{i,10}$ has a similar magnitude
to $|\delta F_{i,10}|$, also for the blue side, because the predictions tend to be too low or too high across
the entire blue side. The mean of $\delta F_{i,m}$ for all 50 QSOs, $\langle \delta F_m \rangle$, is zero by definition, at each
wavelength, when we use m = 0, because $p_{i,0}(\lambda) = \mu(\lambda)$. When m > 0, the statistic $\langle \delta F_m \rangle$
represents the bias in the predictions, and in Table 2 we see that it remains within a few percent
of zero for all m ≤ 10.
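
The two flux error statistics can be compared with a short Python sketch. It assumes a uniformly sampled wavelength grid, so that the ratio of integrals reduces to a mean over pixels; the function name `flux_errors` and the two artificial predictions are illustrative only.

```python
import numpy as np

def flux_errors(p, q, wave, lam1, lam2):
    """Return (absolute, signed) fractional flux error between lam1 and lam2."""
    sel = (wave >= lam1) & (wave <= lam2)
    frac = (p[sel] - q[sel]) / q[sel]
    # On a uniformly sampled grid the ratio of integrals reduces to a pixel mean.
    return np.mean(np.abs(frac)), np.mean(frac)

wave = np.arange(1020.0, 1600.5, 0.5)
q = np.ones(wave.size)                     # stand-in continuum
p_low = 0.9 * q                            # prediction 10% too low everywhere
p_tilt = q * (1.0 + 0.1 * np.sign(wave - 1110.0))   # low on one half, high on the other
abs_low, signed_low = flux_errors(p_low, q, wave, 1050.0, 1170.0)
abs_tilt, signed_tilt = flux_errors(p_tilt, q, wave, 1050.0, 1170.0)
```

A prediction that is offset in one direction across the whole blue side gives the same magnitude for both statistics, whereas an error that changes sign within the window keeps a large absolute error but a signed error near zero. The similar magnitudes in Table 1 are therefore a sign of level offsets rather than shape errors.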

In Figure 4 we show an example of the prediction of the Lyα forest continuum of a higher
redshift QSO from the Sloan Digital Sky Survey (York et al. 2000; Stoughton et al. 2002) for which
the continuum is unobservable because of the large amount of absorption. To make this prediction,
which looks acceptable, we use the red side of its spectrum.



4.4. ERRORS AND DIFFERENT METHODS

Although the error in the levels of many of the predictions was unexpected, we can think
of many possible explanations, including the intrinsic QSO spectra, calibration errors, and the
method.

The intrinsic slope of the continuum may be changing at the UV wavelengths we consider
(Zheng et al. 1997; Telfer et al. 2002). The UV flux from QSOs is black body radiation from the
accretion disk around the central supermassive black hole (Malkan 1983; Sun & Malkan 1989). If
we pass over the peak of the black body continuum, the slope changes rapidly, and we will not have
enough wavelength coverage to follow the change of the slope.

There are many possible errors in the calibration of the spectra, especially the Galactic
extinction correction (Fitzpatrick 1999). At our wavelengths, the extinction increases rapidly as the
wavelength drops. If we underestimate the color excess E(B-V) for a z = QSO by 0.01 magnitudes,
we decrease the flux at 1600 Å by 7%, and at 1020 Å by 14%.

We have presented only one of many ways of predicting spectra. We attempted some other
less successful schemes before we arrived at this method. For instance, we fitted the red side by
minimizing the χ² of $\xi_j(\lambda)$ against $q_i(\lambda)$ to get $a_{ij}$, but we found the blue side prediction is then
unstable. Different combinations of the principal components give similar χ² on the red side, but
their blue sides can have a large variety.

We also experimented with different ways of normalizing the spectra, both near 1450 Å and using
the entire red side, and we obtained similar results. We also tried to remove the slopes from the
spectra by fitting a straight line to the red sides, but these lines were not adequately determined.

The predictions given here could be improved in several ways. We should first seek improved
methods, which might not involve PCA. If we had spectra of many more QSOs we would be less
subject to the distortions and noise in individual spectra, and we could test the predictions on
spectra that were not in the training set. Higher resolution and higher S/N spectra would help to
reveal the weak emission features, and reduce the errors in the initial continuum fit. Extending the
wavelength range on the red side may help identify the slope of the continuum for some QSOs, but
for others we found that a reduced wavelength range, from 1216 - 1400 Å, gave better predictions,
because this restricted range has a stronger correlation with the flux on the blue side. We have
also made predictions with the red side restricted to 1280 - 1600 Å, to avoid the large flux errors
that can occur when there are absorption lines in the Lyα and N V emission lines. We found that
there was no significant change in $\langle |\delta F_m| \rangle$.

This work was funded primarily by NASA grant NAG5-9224, and in part by NAG5-13113, 
and NSF grants AST-9900842 and AST-0098731. We are especially grateful to Paul Francis who 
provided the code and LBQS spectra that he had analyzed in his 1992 paper, and who answered 
many questions. Paul Hewett kindly sent the error arrays for those spectra. Wei Zheng and Buell
Jannuzi kindly provided copies of HST QSO spectra that we used before we located the invaluable
collection of HST spectra posted to the web by Jill Bechtold. We thank Carl Melis and Angela 
Chapman for their careful reading of this manuscript.

Table 1. Statistical Quantities for 50 QSOs^a

QSO              z       $|\delta F_{10}|$   $|\delta F_{10}|$   $\delta F_{10}$
                         Red Side            Blue Side           Blue Side

q0003+1553       0.450     3.3                 5.2                  4.1
q0026+1259       0.145     5.2                 6.7                  5.4
q0044+0303^b     0.623     4.6                27.3                -27.1
q0159-1147^b     0.669     2.1                 3.1                  2.8
q0349-1438       0.615     2.2                14.7                -14.5
q0405-1219       0.572     2.3                 4.4                 -4.3
q0414-0601       0.774     3.5                 8.7                  8.7
q0439-4319       0.593     6.0                 5.3                  5.3
q0454-2203^b     0.532     3.0                31.4                -30.6
q0624+6907^b     0.367     4.7                29.1                -28.8
q0637-7513       0.652     2.7                18.6                 18.6
q0923+3915       0.698     2.5                 4.5                  4.2
q0947+3940       0.205     2.8                15.3                 15.5
q0953+4129       0.233     5.0                16.8                 17.4
q0954+5537^b     0.901     3.8                 2.7                  2.1
q0959+6827       0.767     3.5                 4.4                 -2.3
q1001+2910^b     0.328     5.3                 7.9                  7.8
q1007+4147       0.612     3.4                 8.8                  9.1
q1100+7715^b     0.312     3.9                31.0                 31.0
q1104+1644       0.630     3.1                 2.7                 -0.4
q1115+4042^b     0.154     5.6                14.5                 14.7
q1137+6604^b     0.646     3.1                 2.3                 -2.3
q1148+5454       0.970     3.0                12.3                 12.5
q1216+0655       0.332     9.4                 4.7                 -3.3
q1229-0207^b     1.041     3.1                 7.0                  6.9
q1248+4007       1.027     4.2                 6.4                  6.3
q1252+1157^b     0.868     3.5                 6.8                  6.8
q1259+5918^b     0.468     3.0                13.5                -13.4
q1317+2743       1.009     1.5                10.8                 10.8
q1320+2925^b     0.947     3.1                 4.3                  4.2
q1322+6557       0.168     4.8                13.5                 13.3
q1354+1933       0.720     4.8                 8.1                  8.3
q1402+2609^b     0.165     3.6                10.6                 10.9
q1424-1150^b     0.804     4.8                10.6                 10.6
q1427+4800^b     0.222     4.1                25.9                 26.1
q1444+4047^b     0.266     4.1                15.4                -15.1
q1538+4745^b     0.768     4.5                23.4                -23.2
q1544+4855       0.400     7.4                13.3                -12.8
q1622+2352^b     0.926     4.5                 2.7                  0.5
q1637+5726       0.750     2.5                 3.6                  3.6
q1821+6419^b     0.296     5.0                12.9                -13.2
q1928+7351^b     0.302     2.7                 6.1                 12.0
q2145+0643       1.000     3.7                 2.5                  0.4
q2201+3131^b     0.296     3.1                 3.2                  3.3
q2243-1222       0.626     3.6                 2.6                  2.2
q2251+1120^b     0.325     6.8                27.6                -27.1
q2251+1552       0.856     2.7                 4.1                  2.2
q2340-0339^b     0.894     3.1                24.7                 24.6
q2344+0914^b     0.671     7.8                 5.5                  4.9
q2352-3414       0.707     3.6                 3.9                  1.0

^a We used 10 principal components for the error values, which
we show multiplied by 100.

^b The spectrum contains absorption or unusual photon noise in
the region 1216 - 1240 Å that might have led to an inaccurate
continuum level.

Table 2. Mean Statistical Quantities for 50 QSOs^a

Component m:                                            1     2     3     4     5     6     7     8     9    10

Blue plus Red sides: 1020 Å < λ < 1600 Å
(a) Reconstruction  $\langle \delta E_m \rangle$        50.1  31.5  22.8  15.9  11.8   8.7   6.7   5.7   4.6   3.7

Red Side: 1216 Å < λ < 1600 Å
(b) Reconstruction  $\langle |\delta F_m| \rangle$       8.9   7.5   7.7   5.9   5.1   4.4   3.7   3.7   3.6   3.3
(c) Prediction      $\langle |\delta F_m| \rangle$       8.7   6.7   6.5   5.4   5.6   4.4   4.3   3.9   4.1   3.1

Blue Side: 1050 Å < λ < 1170 Å
(d) Reconstruction  $\langle |\delta F_m| \rangle$      10.5   6.9   5.5   4.1   4.0   3.9   4.1   3.3   3.7   3.2
(e) Prediction      $\langle |\delta F_m| \rangle$      10.4   9.6   9.7   9.2   9.6   9.7   9.7  10.7   9.9   9.4
(f) Prediction      $\langle \delta F_m \rangle$        -1.4  -1.0  -1.2  -1.0  -1.2  -1.2  -1.2  -2.2  -1.4  -1.3

^a We have multiplied the values, which are the means for all 50 QSOs, by 100.

Fig. 1. - Visualization of the 1161 × 1161 correlation matrix (Equation (2)) created from the
continua fitted to the 50 QSO spectra, normalized near 1280 Å. The numerical values, encoded in
the lower bar, depend on the normalization wavelength, and the features near 1280 Å are artifacts of
the normalization. The correlation is unity by definition on the diagonal, and the upper and lower
triangles are identical reflections about this diagonal. Top and left panels show the wavelength range
and the mean spectrum of the 50 QSOs. Emission lines are shown at effective wavelengths from
Wills et al. (1995). We see that the correlation is strong between the emission lines, for example
the horizontal row at the wavelength of C IV shows correlations 0.4 - 0.9 at the wavelengths of
Lyβ and Lyα. The continuum between Lyβ and Lyα is moderately correlated (0.1 - 0.6) with that
between Lyα and Si IV, and somewhat less correlated (0.0 - 0.5) with that from Si IV to C IV.

Fig. 2. - As Figure 1, except showing the covariance matrix given by Equation (3). The variance
is largest at the emission line wavelengths: Lyβ + O VI, 1073 Å, 1123 Å, Lyα, N V, Si II (1263),
O I + Si II (1306), C II (1335), Si IV + O IV] and C IV. We see a grid of peaks in the variance
at the intersections of these wavelengths. The variance is zero at 1280 Å where we normalized
the spectra to the same flux. The variance is larger between Lyβ and Lyα than on the red side,
perhaps because of errors in our continuum fits in the Lyα forest or because of intrinsic variations.
We obtain the principal components by decomposing this covariance matrix into eigenvectors and
eigenvalues.

Fig. 3. - Four examples of our predicted spectra and their accuracy. The top two gave relatively
small errors, and the bottom two relatively large. Note that the vertical scales are not all the same.
The panels on the left show in thick lines the original continuum fitted spectra, $q_i(\lambda)$. The thin
lines show the spectra predicted using 10 components ($p_{i,10}(\lambda)$, Equation (10)). We do not show
any reconstructions $r_{i,m}(\lambda)$, to either the red sides or the whole spectra. The panels on the right
show the corresponding absolute fractional flux error (Equation (11)) averaged over the part of the
blue (1050 - 1170 Å, thin lines) and the whole of the red (thick) sides for each QSO. For both the
blue and red we show the predictions (Equation (10)) and not reconstructions, which are usually
better. For the top two QSOs the errors decrease when we use more components, reaching 2 - 6%
on both the red and blue sides with 10 components. For the bottom two QSOs, the predictions
reach 3 - 4% on the red side, because we use information from those parts of the spectrum, but
they are too low or too high by 13 - 23% on the blue side, primarily because the level is wrong,
even though the emission line shapes seem excellent. These four QSOs represent the variety found
in the sample of 50 QSOs. We were unable to predict which QSOs would behave like the top or
bottom two.

Fig. 4. - An example of a continuum predicted (dotted line) using Equations (8) - (10) for a high
redshift QSO, SDSS J0940+5848 (solid lines). The only information that we use on this QSO is the
red side of this spectrum, from 1216 - 1600 Å. The Lyα forest absorbs strongly and no pixels
in this low resolution spectrum reach the predicted continuum level. We suspect the predicted blue
continuum has the correct shape, including the two emission lines between Lyβ and Lyα, but we
know that the overall level could be in error by 5 - 30%, as we saw in Figure 3.



